Linking Python to basic data science
2023-08-08
In this course, we will show you the basic syntax of Python and some useful packages for data science.
Learn more: TIOBE
For example, output Hello, World! in C# and Python.
Strong community support and a vast number of libraries (137,000)
There is a joke: Life is short, you need Python.
One of the biggest syntax differences between Python and other languages is indentation:
🙅♂️
🙆♂️
What happens if we remove the indentation from the code above? Python raises an IndentationError.
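A minimal sketch of the indentation rule. The two source strings below are hypothetical stand-ins for the slide's code, and `try_compile` is a helper name introduced only for this illustration. It compiles each snippet without running it and reports whether Python accepts the layout.

```python
# Hypothetical snippet: the body of the `if` is NOT indented.
bad_source = "if True:\nprint('hello')\n"
# The same snippet with the body indented by four spaces.
good_source = "if True:\n    print('hello')\n"


def try_compile(source):
    """Return 'ok' if the source compiles, else the name of the error class."""
    try:
        compile(source, "<example>", "exec")
        return "ok"
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return type(err).__name__


bad_result = try_compile(bad_source)
good_result = try_compile(good_source)
print(bad_result)   # IndentationError
print(good_result)  # ok
```

Four spaces per level is the common convention; what matters to the interpreter is that every line in a block uses the same indentation.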
In Python, there are several built-in data types for variables. Here is a table listing some of the common variable types in Python:
| Variable Type | Description | Example |
|---|---|---|
| int | Integer numbers without decimal points | x = 1 |
| float | Floating-point numbers with decimal points | y = 3.1415926 |
| str | Strings (sequences of characters) | name = "Teemo" |
| bool | Boolean values (True or False) | flag = True |
| list | Ordered, mutable collection of elements | numbers = [1, 2, 3] |
| tuple | Ordered, immutable collection of elements | coordinates = (10, 20) |
| set | Unordered, mutable collection of unique elements | unique_numbers = {1, 2, 3} |
| dict | Collection of key-value pairs | person = {'name': 'Alice', 'age': 30} |
| NoneType | Represents the absence of a value | no_value = None |
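To check which type a value has, the built-in `type()` function can be used. The sketch below reuses the example values from the table; the list name `values` is only an illustration.

```python
# One example value per row of the table above.
values = [
    1,                              # int
    3.1415926,                      # float
    "Teemo",                        # str
    True,                           # bool
    [1, 2, 3],                      # list
    (10, 20),                       # tuple
    {1, 2, 3},                      # set
    {"name": "Alice", "age": 30},   # dict
    None,                           # NoneType
]

# type(v).__name__ gives the type name as a plain string.
type_names = [type(v).__name__ for v in values]
for value, name in zip(values, type_names):
    print(f"{value!r} is of type {name}")
```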
a = 1
b = 3.1
c = "Teemo"
# or you can assign several variables in one line:
a,b,c = 1,3.1,"Teemo"
print(a)
print(b)
print(c)
1
3.1
Teemo
'11'
For example:
| Method | Description |
|---|---|
| append(item) | Add an element item to the end of the list. |
| extend(iterable) | Extend the list by appending elements from iterable. |
| insert(index, item) | Insert item at the specified index. |
| remove(item) | Remove the first occurrence of item from the list. |
| pop(index=-1) | Remove and return the element at index. |
| clear() | Remove all elements from the list. |
| index(item, start, end) | Return the index of the first occurrence of item. |
| count(item) | Return the number of occurrences of item. |
| sort(key, reverse) | Sort the list in ascending or descending order. |
| reverse() | Reverse the order of elements in the list. |
| copy() | Create a shallow copy of the list. |
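A short walk through the list methods in the table. The values are made up for illustration, and each comment shows the list right after that call.

```python
numbers = [3, 1, 2]

numbers.append(4)            # [3, 1, 2, 4]
numbers.extend([5, 6])       # [3, 1, 2, 4, 5, 6]
numbers.insert(0, 0)         # [0, 3, 1, 2, 4, 5, 6]
numbers.remove(3)            # [0, 1, 2, 4, 5, 6]
last = numbers.pop()         # returns 6, list is now [0, 1, 2, 4, 5]

numbers.sort(reverse=True)   # [5, 4, 2, 1, 0]
backup = numbers.copy()      # independent shallow copy of the sorted list
numbers.reverse()            # [0, 1, 2, 4, 5]

position = numbers.index(4)  # index of the first 4
twos = numbers.count(2)      # how many times 2 occurs

numbers.clear()              # [] (backup is unaffected)
print(last, position, twos, backup, numbers)
```

Most of these methods change the list in place and return None; only `pop`, `index`, `count` and `copy` return a value.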
To get keys or values of a dict:
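A minimal sketch, reusing the `person` dictionary from the variable-type table: `keys()`, `values()` and `items()` return views, which are wrapped in `list()` here so they print as ordinary lists.

```python
person = {"name": "Alice", "age": 30}

keys = list(person.keys())      # all keys
values = list(person.values())  # all values
items = list(person.items())    # (key, value) pairs

print(keys)
print(values)

# items() is convenient for looping over keys and values together.
for key, value in person.items():
    print(f"{key}: {value}")
```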
| Operation | Operator | Example | Result |
|---|---|---|---|
| Addition | + | 5 + 3 | 8 |
| Subtraction | - | 7 - 2 | 5 |
| Multiplication | * | 4 * 6 | 24 |
| Division | / | 10 / 2 | 5.0 |
| Floor Division | // | 10 // 3 | 3 |
| Exponentiation | ** | 2 ** 3 | 8 |
| Modulus | % | 10 % 3 | 1 |
| Condition | Mathematical Expression | Description |
|---|---|---|
| Equal | a == b | True if a is equal to b; otherwise, False. |
| Not Equal | a != b | True if a is not equal to b; otherwise, False. |
| Greater Than | a > b | True if a is greater than b; otherwise, False. |
| Less Than | a < b | True if a is less than b; otherwise, False. |
| Greater Than or Equal | a >= b | True if a is greater than or equal to b; otherwise, False. |
| Less Than or Equal | a <= b | True if a is less than or equal to b; otherwise, False. |
Comparison operators are usually used together with an if statement.
To test multiple conditions, the elif and else keywords are needed:
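A small sketch of an if / elif / else chain built from the comparison operators above. The function name `compare` is hypothetical and chosen only for this example.

```python
def compare(a, b):
    """Describe how a relates to b using if / elif / else."""
    if a > b:
        return "a is greater than b"
    elif a == b:
        return "a is equal to b"
    else:
        return "a is less than b"


print(compare(5, 3))
print(compare(2, 2))
print(compare(1, 4))
```

The conditions are checked from top to bottom, and only the first branch whose condition is True runs.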
for loop
while loop
for and while loops can be combined with:
break statement
continue statement
else statement
for loop
numbers = [1, 2, 3, 4]
for x in numbers:
    if x == 1:
        continue
    if x == 3:
        continue
    print(x)
else:
    print("loop is over")
2
4
loop is over
with break:
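A sketch of the same loop using break instead of continue. The `printed` list and `else_ran` flag are extra names added only to make the behaviour visible. When a loop is left through break, its else block is skipped.

```python
numbers = [1, 2, 3, 4]
printed = []      # records which values were printed
else_ran = False  # records whether the else block ran

for x in numbers:
    if x == 3:
        break  # leave the loop immediately; 3 and 4 are never printed
    printed.append(x)
    print(x)
else:
    # Runs only if the loop finished without hitting break.
    else_ran = True
    print("loop is over")
```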
while loop
break and continue
def is used to define a function
hello world!!
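A minimal sketch combining these pieces. The function `greet` and the variable names are hypothetical, not taken from the original slides. The while loop collects even numbers, using continue to skip odd ones and break to stop.

```python
def greet(name="world"):
    """Return a greeting; `name` has a default value."""
    return f"hello {name}!!"


print(greet())  # hello world!!

evens = []
count = 0
while True:
    count += 1
    if count % 2 == 1:
        continue  # skip odd numbers and go to the next iteration
    if count > 6:
        break     # stop the loop once we pass 6
    evens.append(count)

print(evens)  # [2, 4, 6]
```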
NumPy is an important module for scientific computing.
import numpy as np
# 1-dimensional array
arr1d = np.array([1, 2, 3, 4, 5])
# 2-dimensional array
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
import numpy as np
# Create a 1-dimensional NumPy array
arr1d = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Slicing: Get elements from index 2 to 5 (exclusive)
sliced_arr1d = arr1d[2:5]
print(sliced_arr1d) # Output: [2 3 4]
# Slicing: Get all elements from index 4 to the end
sliced_arr1d = arr1d[4:]
print(sliced_arr1d) # Output: [4 5 6 7 8 9]
# Slicing: Get elements up to index 3 (exclusive)
sliced_arr1d = arr1d[:3]
print(sliced_arr1d) # Output: [0 1 2]
# Slicing: Get elements from
# index -5 (5th element from the end) to the end
sliced_arr1d = arr1d[-5:]
print(sliced_arr1d) # Output: [5 6 7 8 9]
# Create a 2-dimensional NumPy array
arr2d = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
# Slicing: Get the first two rows
sliced_arr2d = arr2d[:2, :]
print(sliced_arr2d)
# Output:
# [[0 1 2]
# [3 4 5]]
# Slicing: Get the last two columns
sliced_arr2d = arr2d[:, -2:]
print(sliced_arr2d)
# Output:
# [[1 2]
# [4 5]
# [7 8]]
# Slicing: Get a sub-matrix from the original array
sliced_arr2d = arr2d[1:, 1:]
print(sliced_arr2d)
# Output:
# [[4 5]
# [7 8]]
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(f"origin: \n {arr}")
# Reshape the array to (3, 2)
reshaped_arr = arr.reshape((3, 2))
print(f"reshaped: \n {reshaped_arr}")
# Transpose the array
transposed_arr = arr.T
print(f"transposed: \n {transposed_arr}")
origin:
[[1 2 3]
[4 5 6]]
reshaped:
[[1 2]
[3 4]
[5 6]]
transposed:
[[1 4]
[2 5]
[3 6]]
For example:
import numpy as np
import plotly.graph_objects as go
from scipy import interpolate
# Sample data points
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 1, 6, 3])
# Create an interpolation function
interp_func = interpolate.interp1d(x, y, kind='linear')
interp_func1 = interpolate.interp1d(x, y, kind='cubic')
# Define points where we want to estimate values
x_new = np.linspace(1, 5, num=100) # Generate 100 points between 1 and 5
# Perform linear & cubic interpolation to estimate
# y-values for new x-values
y_new = interp_func(x_new)
y_new1 = interp_func1(x_new)
# Plot the original data and the interpolated values
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y,
                         mode='markers', name='Original data'))
fig.add_trace(go.Scatter(x=x_new, y=y_new,
                         mode='lines', name='Linear interpolation'))
fig.add_trace(go.Scatter(x=x_new, y=y_new1,
                         mode='lines', name='Cubic interpolation'))
fig.show()
import numpy as np
import plotly.graph_objects as go
from scipy.optimize import curve_fit
# Sample data points
x = np.array([1, 2, 3, 4, 5])
y = np.array([2.1, 3.9, 7.1, 8.8, 12.3])
# Define the function to fit (in this example, we'll use a simple quadratic function)
def quadratic_func(x, a, b, c):
    return a * x**2 + b * x + c
# Perform curve fitting
params, cov_matrix = curve_fit(quadratic_func, x, y)
# Get the fitted parameters
a_fit, b_fit, c_fit = params
# Generate new x values for plotting the fitted curve
new_x = np.linspace(1, 5, 100)
# Calculate the y values for the fitted curve
fitted_y = quadratic_func(new_x, a_fit, b_fit, c_fit)
trace_data = go.Scatter(x=x, y=y, mode='markers', name='Original Data')
# Create the trace for the fitted curve
trace_fit = go.Scatter(x=new_x, y=fitted_y, mode='lines', name='Fitted Curve')
# Create the layout for the plot
layout = go.Layout(title='Curve Fitting with Scipy', xaxis=dict(title='x'), yaxis=dict(title='y'))
# Create the figure
fig = go.Figure(data=[trace_data, trace_fit], layout=layout)
# Show the figure
fig.show()
For example:
| Data Source | Pandas Function | Example |
|---|---|---|
| CSV | pd.read_csv() | pd.read_csv('data.csv') |
| TXT | pd.read_csv() | pd.read_csv('data.txt', delimiter='\t') |
| DAT | pd.read_csv() | pd.read_csv('data.dat', delimiter=' ') |
| Excel | pd.read_excel() | pd.read_excel('data.xlsx') |
| JSON | pd.read_json() | pd.read_json('data.json') |
| SQL Database | pd.read_sql() | pd.read_sql('SELECT * FROM table_name', connection) |
| Clipboard | pd.read_clipboard() | pd.read_clipboard() |
| URL | pd.read_csv() | pd.read_csv('https://example.com/data.csv') |
| HTML | pd.read_html() | pd.read_html('https://example.com/table.html') |
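A small sketch of `pd.read_csv()` that runs without any file on disk. The CSV text is made up for illustration and is passed through `io.StringIO`, which behaves like an open file. The same call works with a file path such as 'data.csv'.

```python
import io

import pandas as pd

# Made-up comma-separated data with a header row.
csv_text = "col1,col2,col3\n1,2,3\n4,5,6\n"
df = pd.read_csv(io.StringIO(csv_text))

# The same data, tab-separated, read with an explicit delimiter.
tsv_text = csv_text.replace(",", "\t")
df_tab = pd.read_csv(io.StringIO(tsv_text), delimiter="\t")

print(df)
print(df.shape)  # (rows, columns)
```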
import pandas as pd

with open('data.txt', 'r') as file:
    # Skip the header line
    next(file)
    # Initialize an empty list to store data
    data = []
    # Read and process each line in the file
    for line in file:
        line = line.strip()  # Remove leading/trailing whitespace
        col1, col2, col3 = line.split(',')  # Split the line using comma as the delimiter
        data.append({'col1': col1, 'col2': col2, 'col3': col3})
# Create a DataFrame from the data list
df = pd.DataFrame(data)
print(df.head())  # View the first few rows
df.info()  # Summary of DataFrame information (prints directly)
print(df.describe())  # Summary statistics
print(df.shape)  # Number of rows and columns
# Drop rows with missing values
df.dropna()
# Fill missing values with a specific value (e.g., 0)
df.fillna(0)
# Forward fill missing values with the previous value
df.ffill()
df1 = pd.DataFrame({'ID': [1, 2, 3],
                    'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4],
                    'City': ['New York', 'Los Angeles', 'Chicago']})
merged_df = pd.merge(df1, df2, on='ID')
merged_df
| | ID | Name | City |
|---|---|---|---|
| 0 | 2 | Bob | New York |
| 1 | 3 | Charlie | Los Angeles |
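The result above contains only IDs 2 and 3 because `pd.merge()` performs an inner join by default. The sketch below reuses the same two frames and shows how the `how` parameter changes which rows are kept.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3],
                    "Name": ["Alice", "Bob", "Charlie"]})
df2 = pd.DataFrame({"ID": [2, 3, 4],
                    "City": ["New York", "Los Angeles", "Chicago"]})

# Inner join (default): only IDs present in both frames.
inner = pd.merge(df1, df2, on="ID")

# Left join: every row of df1; City is NaN where df2 has no match.
left = pd.merge(df1, df2, on="ID", how="left")

# Outer join: every ID from either frame; missing cells become NaN.
outer = pd.merge(df1, df2, on="ID", how="outer")

print(inner)
print(left)
print(outer)
```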
Learn more: Pandas
GeoPandas aims to provide a convenient and efficient way to work with geospatial data in Python.
- Load *.shp file
ARCID GRID_CODE FROM_NODE TO_NODE \
0 1 1 4 1
1 2 1 4 2
2 3 1 4 6
3 4 1 6 3
4 5 1 5 7
geometry
0 LINESTRING (19.79398 64.29149, 19.79379 64.291...
1 LINESTRING (19.79398 64.29149, 19.79421 64.291...
2 LINESTRING (19.79398 64.29149, 19.79427 64.291...
3 LINESTRING (19.79486 64.29056, 19.79496 64.290...
4 LINESTRING (19.78224 64.29183, 19.78235 64.291...
| Function/Method | Description |
|---|---|
| gpd.read_file() | Load geospatial data from file (e.g., shapefile, GeoJSON, GeoPackage). |
| GeoDataFrame() | Create a GeoDataFrame from a DataFrame with geometry column or other geospatial data. |
| gdf.head() | Display the first few rows of the GeoDataFrame. |
| gdf.plot() | Plot the GeoDataFrame using Matplotlib. |
| gdf.geometry | Access the geometry column of the GeoDataFrame. |
| gdf.crs | Get or set the coordinate reference system (CRS) of the GeoDataFrame. |
| gdf.to_crs() | Reproject (transform) the GeoDataFrame to a new CRS. |
| gdf.bounds | Get the bounding box of each geometry in the GeoDataFrame. |
| gdf.buffer() | Create a buffer around the geometries in the GeoDataFrame. |
| gdf.intersection() | Perform spatial intersection between two GeoDataFrames. |
| gdf.dissolve() | Merge geometries based on a common attribute value. |
| gpd.overlay() | Perform spatial overlay operations (intersection, union, difference, etc.) between GeoDataFrames. |
| gdf.cx[] | Spatial filtering based on a bounding box. |
| Function/Method | Description |
|---|---|
| gdf.intersects() | Check if geometries intersect with a specific geometry. |
| gpd.sjoin() | Perform a spatial join between two GeoDataFrames based on their spatial relationship. |
| gdf.area | Calculate the area of geometries in the GeoDataFrame. |
| gdf.length | Calculate the length of geometries (lines) in the GeoDataFrame. |
| gdf.centroid | Get the centroid point of geometries in the GeoDataFrame. |
| gdf.to_file() | Save the GeoDataFrame to a file (e.g., shapefile, GeoJSON, GeoPackage). |
| Library | Official Documentation |
|---|---|
| NumPy | https://numpy.org/doc/ |
| SciPy | https://docs.scipy.org/doc/scipy/reference/ |
| Pandas | https://pandas.pydata.org/docs/ |
| GeoPandas | https://geopandas.org/en/stable/ |
| Matplotlib | https://matplotlib.org/stable/ |
Thanks 😊
The material can be found on my GitHub
liangch@uef.fi